-
Notifications
You must be signed in to change notification settings - Fork 144
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Remove Reader::scrach_buffer
field + resulting breaking API changes.
#421
Conversation
Hmmm... after having already submitted the PR, I now started to wonder:
|
I would like to eventually make this API change, but for now I wonder if it would be simpler to make |
One motivation for the changes in this PR is to improve the performance of It seems that sometimes decoding row-by-row (or at least smaller-than-frame-chunk-by-chunk) may be desirable if pixels needs some kind of post-processing (e.g. applying gamma, alpha premultiplication, etc.) and the user of the |
Is the main concern the desire to avoid breaking changes? One way to address this concern may be to instead add a new API ( OTOH, IMO some breaking API changes may be required for performance one way or another. So maybe another way to proceed is to wait until those other breaking changes have PRs with measurements that show their impact for the |
The goal is to bundle all the breaking changes we'd like to make into a single These changes generally look good to me though, so I'm fine with letting them linger until we're ready to do a breaking release |
Ack. That sounds good and is totally reasonable. Just as an FYI, let me point out that currently I think that I may want to make the following 3 breaking changes:
I think that it's probably desirable to first flush out 3-4 other PRs with non-breaking changes (starting with the |
f4c1a10
to
7150f4d
Compare
Status update:
|
This commit moves `expand_paletted_into_rgb8` and `expand_paletted_into_rgba8` (and their unit tests) into a separate `transform/palette.rs` module. This prepares room for encapsulating extra complexity in this module in follow-up commits, where we will start to precompute and memoize some data when creating a `TransformFn`. This commit just moves the code around - it should have no impact on correctness or performance.
The `PLTE` chunk's size should be a multiple of 3 (since it contains RGB entries - 3 bytes per entry). Additionally, taking 10000 samples in the `bench_create_fn` benchmarks is a bit excessive after memoization.
This commit changes the `TransformFn` type alias from `fn(...)` into `Box<dyn Fn(...)>`. This allows the `TransformFn` to have store some precomputer, memoized state that we plan to add in follow-up commits. In theory this commit may have negative performance impact, but in the grand scheme of things it disappears into the measurement noise. In particular, when there is no state, then `Box` shouldn't allocate.
Before this commit `expand_paletted_into_rgba8` would: * Perform 2 lookups - `palette.get(i)` and `trns.get(i)` * Check via `unwrap_or` if `i` was within the bounds of `palette`/`trns` This commit introduces `create_rgba_palette` which combines `palette` and `trns` into a fixed-size `[[u8;4]; 256]` look-up table (called `rgba_palette` in the code). After this commit `expand_paletted_into_rgba8` only needs to perform a single look-up and doesn't need to check the bounds. This helps to improve the expansion time by 60+%: - expand_paletted(exec)/trns=yes/src_bits=4/src_size=5461: [-60.208% -60.057% -59.899%] (p = 0.00 < 0.05) - expand_paletted(exec)/trns=yes/src_bits=8/src_size=5461: [-77.520% -77.407% -77.301%] (p = 0.00 < 0.05) `expand_paletted_into_rgb8` performs only a single lookup before and after this commit, but avoiding bounds checks still helps to improve the expansion time by ~12%: - expand_paletted(exec)/trns=no/src_bits=4/src_size=5461: [-12.357% -12.005% -11.664%] (p = 0.00 < 0.05) - expand_paletted(exec)/trns=no/src_bits=8/src_size=5461: [-13.135% -12.584% -12.092%] (p = 0.00 < 0.05) Understandably, this commit regresses the time of `create_transform_fn`. Future commits will reduce this regression 2-4 times: - expand_paletted(ctor)/plte=256/trns=256: [+3757.2% +3763.8% +3770.5%] (p = 0.00 < 0.05) - expand_paletted(ctor)/plte=224/trns=32: [+3807.3% +3816.2% +3824.6%] (p = 0.00 < 0.05) - expand_paletted(ctor)/plte=16/trns=1: [+1672.0% +1675.0% +1678.1%] (p = 0.00 < 0.05)
Before this commit `expand_into_rgb8` would copy 3 bytes at a time into the output. After this commit it copies 4 bytes at a time (possibly cloberring pixels that will be populated during the next iteration - this is ok). This improved the performance as follows: expand_paletted(exec)/trns=no/src_bits=8/src_size=5461 time: [-23.852% -23.593% -23.319%] (p = 0.00 < 0.05)
This improves the performance as follows: - expand_paletted(ctor)/plte=256/trns=256 [-40.581% -40.396% -40.211%] (p = 0.00 < 0.05) - expand_paletted(ctor)/plte=224/trns=32 [-24.070% -23.840% -23.592%] (p = 0.00 < 0.05) Small palettes are mostly unaffected: - expand_paletted(ctor)/plte=16/trns=1 [-0.2525% +0.0338% +0.3239%] (p = 0.81 > 0.05)
Remove remaining uses of miniz_oxide for decoding
…te-usages Remove usages of `extern crate`
Fixes image-rs#357.
This commit is desirable, because it avoids copying of image data across intermediate buffers. This commit was motivated by the data gathered in image-rs#416 (comment) The commit results in the followin performance gains seen in the recently introduced `row-by-row/128x128-4k-idat` benchmark: - time: [-18.401% -17.807% -17.192%] (p = 0.00 < 0.05) - time: [-9.4276% -8.8789% -8.2860%] (p = 0.00 < 0.05) - time: [-12.389% -11.780% -11.181%] (p = 0.00 < 0.05) Fixes image-rs#417
7150f4d
to
29e90cc
Compare
@fintelia, can you PTAL again? Process notesI see that this PR has been rebased on top of https://github.com/image-rs/image-png/tree/next, but this branch is 24 commits behind https://github.com/image-rs/image-png/tree/master and this makes GitHub is quite confused. In particular, it is not quite correct that in this PR “anforowicz wants to merge 26 commits” (as GitHub says at the top) - I just want to merge the 2 commits here: master...anforowicz:image-png:scratch-buffer-removal: b7d0c06 and 29e90cc. So, maybe you want to rebase on top of the I am also not sure what the right process is for landing breaking API changes. I went ahead and bumped up the version to 0.18.0 in Performance ImpactPerformance of This PR adds a benchmark that covers In theory the changes in this PR can also result in a small performance gain in some non-incremental decoding scenarios in Breaking ChangesAt this point this PR is the only breaking change that I plan to do (given that the I note that a bit ago the If the breaking API changes are nevertheless concerning, then we can also keep the old APIs (maybe marking them as
|
I'm supportive of this PR, but I'm too burned out right now to review everything and manage a major release. As far as the |
Ack. My thinking is gradually evolving toward keeping
So, now I am leaning toward abandoning the current PR and introducing a separate, new API instead in WDYT? |
PTAL?
This mostly follows what we've discussed in #417.
If we merge this, then some coordination may be required for releasing a new version of the
png
crate and then updating theimage
crate:image
crate after a new version of thepng
crate is released. OTOH, I am not sure how to get notified when this happens?png
crate. It may be worth waiting until some additional changes get merged:Read
=>BufRead
is on the table - I want to try measuring this after flushing out other changes from the "copy avoidance series of commits")